High data rate communication system for wireless applications

ABSTRACT

A decoder for a communication system includes an iterative decoding module configured to receive soft-input information bits. The iterative decoding module iterates on probability estimates of the soft-input information bits to generate hard-decision output information. The iterative decoding module includes a plurality of arithmetic modules operating to generate and process both backward and forward metrics substantially simultaneously using modulo arithmetic operations.

BACKGROUND

1. Field

The present invention relates generally to wireless communication systems and, more particularly, to high data rate wireless applications.

2. Description of the Related Art

Fourth generation (4G) wireless applications will require more advanced error correcting techniques than previous wireless applications. The 4G error correcting techniques will enable reliable data transmission and greater data rates at lower channel signal-to-noise ratio (SNR). A class of forward error correction codes, referred to as turbo codes, offers significant coding gain for power-limited communication channels. Turbo codes are generated by using two or more recursive systematic convolutional (RSC) encoders operating on different orderings of the same information bits. A subset of the code bits generated by each encoder is transmitted to maintain bandwidth efficiency. Turbo decoding involves an iterative technique in which probability estimates of the information bits that are derived for one of the decoded code words are fed back to a probability estimator for a second one of the code words. Each iteration of processing generally increases the reliability of the probability estimates. This process continues, alternately decoding the two code words until the probability estimates are sufficient to make reliable decisions about the original information bits.

Prior to turbo codes, the commonly used error correcting codes were convolutional encoders paired with Viterbi decoders. Convolutional codes are capable of allowing a communication system to reach the Shannon limit, which is the theoretical limit of SNR for error free communication over a given noisy channel. Viterbi decoders, however, grow exponentially in complexity as their error correction capability is increased, making their practical limit 3 dB to 6 dB away from the Shannon limit for practical hardware/software implementations. In contrast to convolutional codes, turbo codes implemented with a practical decoder have been shown to achieve a performance of 0.7 dB from the Shannon limit, far surpassing the performance of a convolutional-encoder/Viterbi-decoder of similar complexity. Turbo decoding techniques have not yet reached the maturity and open availability that Viterbi techniques enjoy, so implementing a turbo code is not a trivial exercise. A 3G standard written by an industry group called the Third Generation Partnership Project (3GPP) specifies the design of the turbo encoder in great detail but does not specify the decoder design, leaving that choice up to the designer.

A maximum a posteriori (MAP) decoding technique introduced by Bahl, Cocke, Jelinek, and Raviv in “Optimal Decoding of Linear Codes for Minimizing Symbol Error Rate”, IEEE Transactions on Information Theory, March 1974, pp. 284-287, is a symbol-by-symbol decoder for trellis codes. The MAP technique delivers excellent performance as a component decoder in decoding turbo codes. The technique is advantageous for decoding turbo codes because it accepts soft-decision information as an input and produces soft-decision output information.

The MAP technique can be used in a turbo decoder to generate a posteriori probability estimates of the systematic bits (i.e., information bits) in a first iteration of decoding the code word. These probability estimates are used as a priori symbol probabilities in a second iteration. Those skilled in the art will recognize three fundamental terms in descriptions of the MAP technique, which are: forward and backward state probability functions (the alpha and beta functions, respectively) and the a posteriori transition probabilities (the sigma function).

One problem with the MAP technique for the turbo decoder is that a relatively large amount of memory is required. For example, the entire received code word must be stored during decoding due to the nature of the MAP technique. Furthermore, in order to obtain high-speed decoding, it is necessary to store a large number of intermediate results that represent various event probabilities of interest so they can be combined with other results later in the decoding process. The MAP technique as described by Bahl et al. requires that at least half of the results from the two recursive calculations be stored in memory for fast decoding. Such requirements can limit the decoding computation speed and can tax system memory resources.

Therefore, it is desirable to reduce the time for computation and memory required in turbo decoding without compromising coding gain. Thus, there is a need for improved turbo decoder design. The present invention satisfies this need.

SUMMARY

A decoder for a communication system includes first and second decoder blocks and a decision module. The first decoder block calculates a probability estimate for each soft-input information bit. The second decoder block receives and processes the probability estimate of the soft-input information bits using modulo arithmetic operations. The decision module receives the processed soft-input information bits and generates hard-decision output information.

In another aspect, a decoder for a communication system includes an iterative decoding module configured to receive soft-input information bits. The iterative decoding module iterates on probability estimates of the soft-input information bits to generate hard-decision output information. The iterative decoding module includes a plurality of arithmetic modules that generate and process both backward and forward metrics substantially simultaneously using modulo arithmetic operations.

In another aspect, a decoder for a communication system can include a soft-input soft-output (SISO) decoding module. The SISO decoding module includes a first plurality of modules to receive and process soft-input backward state metrics using modulo arithmetic. The module also includes a second plurality of modules to receive and process soft-input forward state metrics using modulo arithmetic.

In another aspect, a decoding method is applied to soft-input information bits with a backward recursion that is performed using a trellis diagram by computing backward state metrics of each node on the trellis diagram of the symbol information bits. The backward state metrics are stored in a storage mechanism. A forward recursion is then performed on the trellis diagram by computing forward state metrics of each node on the trellis diagram of the symbol block data. Finally, the extrinsic information is calculated.

Other features and advantages of the present invention should be apparent from the following description of the preferred embodiment, which illustrates, by way of example, the principles of the invention.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a block diagram of a wireless communication system in which exemplary embodiments of the invention can be practiced.

FIG. 2 is a block diagram of the transmitter system shown in FIG. 1.

FIG. 3 is a block diagram of an exemplary transmitter system configured as a particular embodiment of the transmitter system in FIG. 2.

FIG. 4 is a functional block diagram of the turbo encoder shown in FIG. 3.

FIG. 5 is a block diagram of the receiver system shown in FIG. 1.

FIG. 6 is a detailed block diagram of the exemplary receiver system illustrated in FIG. 5.

FIG. 7 is a functional block diagram of the turbo decoder shown in FIG. 6.

FIG. 8 shows a trellis diagram with eight states in the vertical axis, and k+3 time intervals along the horizontal axis.

FIG. 9 is a flowchart that illustrates the computations performed by the exemplary MAP decoders of FIG. 7.

FIG. 10 illustrates an exemplary parallel window technique in which the decoder of FIG. 5 divides a data symbol block into a predetermined number of windows.

FIG. 11 illustrates an example of a memory storage reduction technique used by the FIG. 7 decoders that reduces the storage requirements for the backward state metrics.

FIG. 12 shows an alternative example of the memory storage reduction technique for storing the backward state metrics of a MAP decoder.

FIG. 13 shows an exemplary interleaving process implemented by the decoder of FIG. 5.

FIG. 14 is a block diagram of a MAP decoder shown in FIG. 7.

DETAILED DESCRIPTION

Exemplary embodiments are described for a communication system having a transmitter and a receiver such that the receiver includes an efficient, high data-rate turbo decoder for wireless applications. The high data-rate turbo decoder implements mathematical and approximation techniques that significantly reduce the required computations and the need for memory storage. In particular, the turbo decoder uses modulo arithmetic in maximum a posteriori probability (MAP) modules configured with double add-compare-select (ACS) circuits. The MAP modules use a parallel window technique to perform backward state metric calculation recursion and to store the metrics. Forward state metric calculation and the probability estimate (referred to as extrinsic information) calculation are performed in parallel using the stored backward state metrics. These techniques increase the speed of computation and reduce the memory required for computation.

FIG. 1 is a block diagram of a wireless communication system 100 in which exemplary embodiments of the invention can be practiced. The communication system 100 includes a transmitter system 102 and a receiver system 104, which communicate with each other over a network 130 through modems 106, 108, respectively. The modems 106, 108 can transmit and receive signals and/or share resources through the network 130, which can comprise a local area network (LAN), a wide area network (WAN) (e.g., the Internet), or a code division multiple access (CDMA) telecommunications network. The local area network (LAN) can be configured as a home or office wireless network designed to communicate and share data and resources between devices, particularly at a high data rate. Thus, in one embodiment, the modems 106, 108 can operate in a high frequency (e.g., 3 to 10 GHz) region with a relatively wide bandwidth (e.g., up to 7 GHz). Furthermore, the modems can operate over the operating bandwidth in a multicarrier configuration.

The transmitter 102 and receiver 104 of the communication system 100 can be configured as a handheld mobile unit such as a personal digital assistant (PDA), or can be configured as a mobile station telephone, a base station, or other systems/devices that generate and receive digital signals for transmission. The transmitter 102 and receiver 104 can be implemented in a single device that provides a transceiver. The modulator 110 and the encoder 112 in the transmitter system 102, and the demodulator 120 and the decoder 122 in the receiver system 104, include modules that can be configured in different embodiments, as described in detail below.

FIG. 2 is a block diagram of a transmitter system 102 configured in accordance with the present invention. Input data 212 is sent to a cyclic redundancy check (CRC) generator 202 that generates CRC checksum data for a predetermined amount of data received. The input data and checksum data comprise a resulting data block. The resulting data blocks are sent to a turbo encoder 204, which processes the data block and generates code symbols that are supplied to a channel interleaver 206. The code symbols typically include a retransmission of the original input data (called the systematic symbol), and one or more parity symbols.

The number of parity symbols transmitted for each systematic symbol depends on the coding rate of the transmission. For a coding rate of one-half (½), one parity symbol is transmitted for every systematic symbol, for a total of two symbols generated for each data bit (including CRC) received. For a turbo coding rate of one-third (⅓), two parity symbols are generated for each systematic symbol, for a total of three symbols generated for each data bit received.

The code symbols from the turbo encoder 204 are sent to a channel interleaver 206, which interleaves the symbol blocks as they are received. Typically, the channel interleaver 206 performs block or bit-reversal interleaving. Other types of interleavers may be used as the channel interleaver 206, so long as the code symbols are interleaved.

A mapper 208 takes the interleaved code symbols from the channel interleaver 206 and generates symbol words of predetermined bit width based on a mapping scheme. The mapping scheme is described further below. The symbol words are then applied to a modulator 210 that generates a modulated waveform based on the symbol word received. Typical mapping techniques include BPSK, QPSK, 8-PSK, 16-QAM, and 64-QAM. Typical modulation techniques include single carrier modulation and multicarrier modulation. Various other modulation schemes can be utilized. The modulated waveform is then upconverted for transmission at an RF frequency.

FIG. 3 is a block diagram of an exemplary transmitter system 300 configured as a particular embodiment of the transmitter system 102 illustrated in FIG. 2. The exemplary transmitter system 300 is configured as a multicarrier-spread spectrum (MC-SS) system, where a communication path having a fixed bandwidth is divided into a number of sub-bands having different frequencies. The width of the sub-bands is chosen to be small enough to allow the distortion in each sub-band to be modeled by a single attenuation and phase shift for the band. If the noise level in each band is known, the volume of data sent in each band can be optimized by choosing a symbol set having the maximum number of symbols consistent with the available signal to noise ratio of the channel. By using each sub-band at its maximum capacity, the amount of data that can be transmitted in the communication path is maximized.

Multicarrier transmission schemes, such as Orthogonal Frequency Division Multiplexing (OFDM), have been proposed and are used for many different types of communication systems, including broadband wireless Local Area Networks (LANs). The advantage of such schemes is that highly time dispersive channels can be equalized efficiently to provide transmission data rates that, when combined with forward error correction techniques, are capable of approaching the theoretical Shannon limit for a noisy channel. A highly dispersive channel can arise from a combination of multiple discrete, delayed, and attenuated signals, particularly in radio frequency environments, or can be an intrinsic property of a transmission medium (such as within a wireline copper-pair or a fiber-optic transmission system) where the group delay is a continuous function of frequency. Additionally, these types of multicarrier systems are particularly suited to wide bandwidth applications having high data rates.

The transmitter system 300 includes a turbo encoder 304, an interleaver 306, a mapper 308, a spreader 319, and a multicarrier modulator 310, similar to the corresponding modules of the system 102 (FIG. 2). The exemplary transmitter system 300 further includes a scrambler 302 for data security. In one embodiment, the scrambler 302 can also perform the function of the CRC generator 202 (FIG. 2). The modulator 310 is implemented with banks of digital filters 311 that make use of inverse fast Fourier transforms (IFFT) and includes modules 312, 314 for data acquisition. The system 300 also includes modules 316, 318 for data transmission.

In the particular embodiment of FIG. 3, a single data stream to be transmitted over the communication path can be broken into a plurality of N sub-bands. Each sub-band is transmitted during a system communication cycle. During each communication cycle, the portion of the data stream to be transmitted is converted to N symbols chosen to match the capacity of the various N sub-band channels. Each symbol represents the amplitude of a corresponding sub-carrier. The time domain signal to be sent on the communication path is obtained by modulating each sub-carrier by its corresponding amplitude and then adding the modulated carriers to form the signal to be placed in the communication path. This operation is carried out by transforming a vector of N symbols using the IFFT filter 311 to generate N time domain values that are sent in sequence on the communication path. At the receiver end of the communication path, the N time domain values can be accumulated and transformed using a fast Fourier transform to recover the original N symbols after equalization of the transformed data to correct for the attenuation and phase shifts that occurred in the channels.

For channels with memory, such as fading channels, errors typically come in bursts, rather than being randomly distributed. The interleaver 306 is used to reorder a binary sequence in a systematic way to disperse the burst errors, making them easier to correct.

The spreader 319 spreads the symbol word over multiple sub-carriers to achieve frequency diversity. The sub-carrier distance and the number of sub-carriers are appropriately chosen so that it is unlikely that the symbol word is completely located in a deep fade. This system is referred to as a multicarrier-spread spectrum (MC-SS) system.

FIG. 4 is a functional block diagram of a turbo encoder 304 constructed according to the invention. In the illustrated embodiment of FIG. 4, the encoder 304 includes two 8-state convolutional encoders (i.e., the “constituent” encoders) 400, 402. Each convolutional encoder includes a plurality of delay elements (indicated by blocks with D designations) that generate polynomial terms used in convolutional equations.

The data bit stream from the scrambler 302 (FIG. 3) is received as input and enters the first constituent encoder 400 that produces a parity bit (Output 2) for each input bit. The input data bit stream also goes through an interleaver 410, which scrambles the bit ordering, and then goes to the second constituent encoder 402, which produces a parity bit (Output 3) for each input bit. The bit ordering of the interleaver remains the same from frame to frame. For the 3GPP standard, the interleaver 410 operates on a data frame length that can be in a range from 40 bits to 5114 bits long. The data sent across the channel is the original bit stream (Output 1), the parity bits (Output 2) of the first constituent encoder 400, and the parity bits (Output 3) of the second constituent encoder 402. Thus, the turbo encoder 304 is a rate ⅓ encoder.
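By way of illustration only, the rate ⅓ structure described above can be sketched in C as follows. The sketch assumes the generator polynomials commonly associated with the 3GPP turbo code (feedback 1 + D^2 + D^3, feedforward 1 + D + D^3); these polynomials, the names rsc_state, rsc_encode_bit, and turbo_encode, and the permutation array perm are assumptions introduced for the example and are not taken from FIG. 4. Trellis termination is omitted.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 8-state RSC encoder state: three delay elements (D blocks). */
typedef struct { uint8_t d1, d2, d3; } rsc_state;

/* Encode one input bit and return the parity bit.
 * Assumed polynomials (not taken from FIG. 4):
 *   feedback    g0(D) = 1 + D^2 + D^3
 *   feedforward g1(D) = 1 + D   + D^3 */
static uint8_t rsc_encode_bit(rsc_state *s, uint8_t in)
{
    assert(in <= 1);
    uint8_t fb = (uint8_t)(in ^ s->d2 ^ s->d3);      /* recursive feedback sum */
    uint8_t parity = (uint8_t)(fb ^ s->d1 ^ s->d3);  /* feedforward taps */
    s->d3 = s->d2;                                   /* shift the delay line */
    s->d2 = s->d1;
    s->d1 = fb;
    return parity;
}

/* Rate-1/3 encoding of k bits: out1 = systematic bits (Output 1),
 * out2 = parity of the first encoder (Output 2), out3 = parity of the
 * second encoder fed with interleaved bits (Output 3).
 * perm[i] is the index of the input bit read at interleaver position i. */
static void turbo_encode(const uint8_t *bits, const int *perm, int k,
                         uint8_t *out1, uint8_t *out2, uint8_t *out3)
{
    rsc_state e1 = {0, 0, 0};
    rsc_state e2 = {0, 0, 0};
    for (int i = 0; i < k; i++) {
        out1[i] = bits[i];
        out2[i] = rsc_encode_bit(&e1, bits[i]);
        out3[i] = rsc_encode_bit(&e2, bits[perm[i]]);
    }
}
```

Each input bit yields one systematic bit and two parity bits, which is what makes the overall code rate ⅓.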

FIG. 5 is a block diagram of a receiver system 104 configured in accordance with the invention. The receiver system 104 includes an RF unit 504 that receives, acquires, down-converts, and filters input RF signals. A demodulator 506 then demodulates, processes, and digitizes the down-converted signals. A demapper 508 receives the digitized data and provides soft decision data to a channel deinterleaver 510. The turbo decoder 512 decodes the soft decision data from the channel deinterleaver 510 and supplies the resulting hard decision data to a processor or control unit of the receiver system 104, which can check the accuracy of the data using the CRC checksum data.

FIG. 6 is a block diagram of an exemplary receiver system 600 configured as a particular embodiment of the receiver system 104 shown in FIG. 5. The receiver system 600 includes an RF unit 602, a demodulator 604, a symbol demapper 642, a channel deinterleaver 644, and a turbo decoder 650, similar to the corresponding modules of the receiver system 104 illustrated in FIG. 5. However, the exemplary receiver system 600 includes additional modules 620, 622, 630 used for data acquisition and conversion.

In the particular embodiment of FIG. 6, the RF unit 602 includes a numerically controlled oscillator (NCO) 610, a receive filter 612, and an automatic gain control (AGC) module 614. The demodulator 604 is implemented as a multicarrier-spread spectrum demodulator using a fast Fourier transform (FFT) module 632. The demodulator 604 also includes an equalizer 634, a pilot processing module 636, a phase compensator 638, a multiplier 640, and a despreader 651. The processing of the modules will be understood by those skilled in the art.

FIG. 7 is a functional block diagram of a high data-rate turbo decoder 650 according to an embodiment of the invention. Because a pure maximum likelihood decoder for a turbo code is very complex to implement, the exemplary decoder 650 employs an iterative decoding scheme to approximate the maximum likelihood computation, thereby conserving memory resources and increasing computation speed.

The high data-rate turbo decoder 650 operates in an iterative fashion with two decoder blocks 700, 710 corresponding to the two constituent encoders 400, 402 (see FIG. 4). The first decoder block 700 (MAP1) makes an estimate of the probability for each data bit as to whether it is a 1 or a 0 by operating on the received data (Input 1 in FIG. 7 corresponding to Output 1 in FIG. 4) and parity bits (Input 2 in FIG. 7 corresponding to Output 2 in FIG. 4) that were produced by the first constituent encoder 400. The received data and parity bits are soft values from the channel deinterleaver. The estimate of the probability (Extrinsic 1) is then sent to the second decoder block 710 (MAP2), through an interleaver 702, along with the interleaved received data and the parity bits (Input 3) produced by the second constituent encoder 402. The received data (Input 1) is interleaved by an interleaver 704 and provided to the second decoder block.

The above-described process of two passes through the decoding technique from MAP1 to MAP2 is considered to be one iteration of the decoder 650 and is repeated for a fixed number of iterations, or until some external mechanism determines that no further iterations will improve the bit error rate (BER) for that frame. For example, the external mechanism can comprise processing that detects when a sequence of estimations has changed by less than an error threshold value. That is, if the change from one iteration to the next is below the error threshold, then the iterations are complete. After all iterations are complete, the original data bits are recovered by making a hard decision on the last soft output. The hard decision is made by a decision module 714. The bit output is then produced by the decision module 714.
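As a minimal sketch of the stopping rule and hard decision just described, the following C fragment assumes that the "change" between iterations is measured as the largest absolute difference between corresponding soft outputs; the source does not specify the measure, so that choice, the integer representation of the soft values, and the names iterations_converged and hard_decision are assumptions made for illustration.

```c
#include <assert.h>
#include <stdlib.h>

/* Returns 1 when every soft output changed by less than `threshold`
 * between two successive iterations, and 0 otherwise.
 * Assumption: change is measured per bit as an absolute difference. */
static int iterations_converged(const int *prev, const int *curr,
                                int n, int threshold)
{
    assert(n >= 0 && threshold >= 0);
    for (int i = 0; i < n; i++) {
        if (abs(curr[i] - prev[i]) >= threshold)
            return 0;
    }
    return 1;
}

/* Hard decision on the last soft output: a positive value maps to bit 1,
 * anything else to bit 0 (mapping a value of exactly zero to 0 is an
 * arbitrary tie-break chosen for this sketch). */
static void hard_decision(const int *soft, int n, unsigned char *bits)
{
    assert(n >= 0);
    for (int i = 0; i < n; i++)
        bits[i] = (unsigned char)(soft[i] > 0 ? 1 : 0);
}
```

A decoder loop could call iterations_converged after each MAP1/MAP2 pass and fall back to a fixed iteration limit when the test never succeeds.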

The decoding technique employed within the two decoder blocks 700, 710 operates on soft inputs (the deinterleaver outputs and the probability estimates) and produces soft outputs. A high data-rate Maximum a posteriori Probability (MAP) decoder (see FIG. 14 for detail) produces good BER performance for these requirements. Although the illustration in FIG. 7 shows two MAP decoders, in the preferred embodiment they are physically implemented as one unitary MAP decoder. Those skilled in the art will understand how to implement a single MAP decoder to function as the structure illustrated in FIG. 7, by operating as MAP1 during one cycle and then as MAP2 during another cycle.

The unitary MAP decoder is configured as a trellis decoder, like the Viterbi decoder. Accordingly, each constituent trellis encoder in the turbo encoder can be defined by a trellis diagram with eight states in the vertical axis, and k+3 time intervals along the horizontal axis as shown in FIG. 8. The term k is the length of the interleaver and the three extra time intervals are needed for the trellis termination (getting the encoder back to state 0). The trellis diagram is simply a state diagram with a time axis. After three time intervals the branches in the trellis repeat themselves, so only one time slice of the trellis is needed to define the entire trellis.

The computation operations for an exemplary embodiment of a decoding technique for a modified unitary MAP decoder are summarized in a flowchart shown in FIG. 9. The illustrated computations correspond to one iteration of one MAP decoder in the functional block diagram of FIG. 7.

Initially, at the first box 900, a backward recursion is performed on the trellis by computing backward state metrics (i.e., beta values) for each node in the trellis diagram. The backward state metric of a node is the sum of the previous backward state metric (i.e., at the previous time point) multiplied by the branch metric along each branch from the two previous nodes to the current node. The branch metric (gamma value) is the exponential of the trellis distance between the hard encoder values and the soft received values from the deinterleaver, divided by the channel noise variance, multiplied by the probability estimate from the previous decoder. The computation starts at the end of the trellis diagram and progresses in the reverse direction. At box 902, the backward state metrics are stored in a storage mechanism such as a random access memory (RAM). Various techniques for reducing the storage requirement for the backward state metrics are discussed in detail below.
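One step of the backward recursion described above can be sketched in C as shown below, working directly with probabilities rather than in the log domain. The branch metrics are taken as given inputs, and the trellis connectivity is supplied by a caller-provided table; the names backward_step, NUM_STATES, and the flattened layout index s*2+u are assumptions made for the example and do not come from the Appendix.

```c
#include <assert.h>

#define NUM_STATES 8   /* eight trellis states, as in FIG. 8 */

/* One backward-recursion step in the probability domain.
 *   beta_next[s']         : backward metrics at the later time point
 *   gamma[s*2 + u]        : branch metric leaving state s with input bit u
 *   next_state[s*2 + u]   : state reached from s when the input bit is u
 *   beta_prev[s]          : result, backward metrics one time point earlier
 * Each node sums, over its two branches, the branch metric multiplied by
 * the backward metric of the node at the other end of the branch. */
static void backward_step(const double *beta_next, const double *gamma,
                          const int *next_state, double *beta_prev)
{
    for (int s = 0; s < NUM_STATES; s++) {
        double sum = 0.0;
        for (int u = 0; u < 2; u++) {
            int to = next_state[s * 2 + u];
            assert(to >= 0 && to < NUM_STATES);
            sum += gamma[s * 2 + u] * beta_next[to];
        }
        beta_prev[s] = sum;
    }
}
```

Repeated products of values below one shrink quickly in this form, which is the underflow concern that motivates log-domain and modulo arithmetic later in the description.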

A forward recursion on the trellis is performed at box 904 by computing the forward state metrics (i.e., alpha values) for each node in the trellis diagram. The forward state metrics can be computed in a similar manner as the backward state metrics. For the forward state metrics, however, the computation starts at the beginning of the trellis diagram and progresses in the forward direction.

Extrinsic information that is to be delivered to the next decoder in the iteration sequence is computed at box 906. Computation of the extrinsic information involves computing the log likelihood ratio (LLR) for each time point. The LLR value is computed as the sum of the products of the alpha, beta, and gamma values for each branch at a particular time that is associated with a ‘1’ in the encoder, divided by the sum of the products of the alpha, beta, and gamma values for each branch at the same particular time that is associated with a ‘0’ in the encoder. Finally, the extrinsic information is the LLR value minus the input probability estimate.

This sequence of computations is repeated for each iteration by each of the two decoders MAP1 and MAP2. After all iterations are completed, the decoded information bits can be retrieved by examining the sign bit of the LLR value. If the sign bit is positive, then the resultant bit is a one. If the sign bit is negative, then the resultant bit is a zero. This is because the LLR value is defined to be the logarithm of the ratio of the probability that the bit is a one to the probability that the bit is a zero.

Since the conventional MAP design is relatively complex, with computations involving exponentials, a real-time operation of the MAP decoder is usually difficult and better to avoid. Therefore, some simplifications and approximations can be used to significantly reduce the required computations and provide greater efficiencies. For example, the computations of the MAP decoder technique can be configured to operate in the log domain. This converts all multiplications to additions, divisions to subtractions, and eliminates exponential and log computations, without affecting the bit error rate (BER) performance. Operating in the log domain also keeps growth of the state metric numbers to a manageable range. In practice, since the log of a sum of exponentials is frequently needed, an additional simplification is used in the preferred embodiment. The simplification involves using the Jacobian formula to simplify the log of the sum of exponentials as max-log. Accordingly, a MAP decoder can be implemented as a max-log-MAP decoder without significantly affecting the BER performance.
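For reference, the Jacobian formula states that log(e^a + e^b) = max(a, b) + log(1 + e^-|a-b|); the max-log simplification drops the second term. A minimal C sketch of the resulting LLR and extrinsic computation is given below. It assumes the caller has already formed, for each branch, the log-domain sum alpha + gamma + beta; the function names max_log_llr and extrinsic_value and the integer metric type are assumptions made for illustration.

```c
#include <assert.h>

/* Max-log approximation of the LLR at one time point.
 *   metric1[i] : alpha + gamma + beta (log domain) for the i-th branch
 *                associated with a '1' in the encoder
 *   metric0[i] : the same for the i-th branch associated with a '0'
 *   n          : number of branches in each group
 * log(sum of exponentials) is replaced by the maximum, so the ratio of
 * sums becomes a difference of maxima. */
static int max_log_llr(const int *metric1, const int *metric0, int n)
{
    assert(n > 0);
    int best1 = metric1[0];
    int best0 = metric0[0];
    for (int i = 1; i < n; i++) {
        if (metric1[i] > best1) best1 = metric1[i];
        if (metric0[i] > best0) best0 = metric0[i];
    }
    return best1 - best0;
}

/* Extrinsic information: the LLR minus the input probability estimate. */
static int extrinsic_value(int llr, int apriori)
{
    return llr - apriori;
}
```

Only additions, subtractions, and comparisons remain, which is what makes the max-log form attractive for hardware.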

In another exemplary simplification, modulo arithmetic computation is used to obviate the need for scaling and/or normalization. Because the backward and forward recursions require successive multiplications by numbers less than one, even 32-bit floating-point numbers will underflow unless they are scaled. Scaling requires additional operations that will slow down the turbo decoder. By using the log domain and the modulo arithmetic computation, no scaling may be needed for 32-bit fixed-point numbers. Exemplary computer programs (written in C language) that can be performed by the MAP decoder for the modulo arithmetic are shown in the Appendix (below). The programs can be implemented by having an 11-bit register and allowing the result of the arithmetic operation to wrap around or overflow. The Appendix also includes examples of computer programs illustrating the computation techniques for calculating the backward state metrics, the forward state metrics, and the extrinsic information.
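The wrap-around behavior can be illustrated with the following C sketch, which is not the Appendix code. It assumes an 11-bit metric register and assumes that the true difference between any two metrics being compared always lies strictly between -1024 and +1024, so that the sign of the wrapped difference identifies the larger metric. The names mod_add, mod_diff, and mod_max are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define METRIC_BITS 11
#define METRIC_MASK ((1u << METRIC_BITS) - 1u)   /* 0x7FF */
#define METRIC_SIGN (1u << (METRIC_BITS - 1))    /* 0x400 */

/* Add a (possibly negative) increment and let the result wrap
 * around inside the 11-bit register. */
static uint16_t mod_add(uint16_t a, int delta)
{
    assert(a <= METRIC_MASK);
    return (uint16_t)(((unsigned)a + (unsigned)delta) & METRIC_MASK);
}

/* Wrapped difference a - b, read as an 11-bit two's complement number.
 * This equals the true difference whenever the true difference lies
 * in the range [-1024, 1023]. */
static int mod_diff(uint16_t a, uint16_t b)
{
    unsigned d = ((unsigned)a - (unsigned)b) & METRIC_MASK;
    return (d & METRIC_SIGN) ? (int)d - (int)(METRIC_MASK + 1u) : (int)d;
}

/* Select the larger metric using only the sign of the wrapped difference. */
static uint16_t mod_max(uint16_t a, uint16_t b)
{
    return (mod_diff(a, b) >= 0) ? a : b;
}
```

For example, a metric of 2040 incremented by 20 wraps to 12, yet it still compares as larger than 2045 because the wrapped difference is +15; no rescaling of the metrics is required.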

As mentioned above, the backward state metrics should be stored because all the previous backward state metrics are needed to compute the current state metrics and the extrinsic information. This results in a storage requirement for a large number of state metrics, which leads to an unacceptable cost for most practical interleaver sizes. In accordance with the preferred embodiment, one of the solutions to the storage problem is the introduction of the aforementioned parallel window technique.

FIG. 10 illustrates an exemplary parallel window technique in which a data symbol block (N_dbps, which represents one multicarrier symbol) is divided into a predetermined number of windows (num_windows). In the illustrated embodiment of FIG. 10, the data symbol block includes 2506 data bits with 128 bits per window. This results in nineteen windows with 128 bits and one window with 74 bits. Moreover, since the data bits are interdependent, parallel windows are configured to overlap. In the illustrated embodiment of FIG. 10, the range of the overlapping is set at 30 bits. However, the size of the window and the range of the overlapping can be adjusted to provide the best result.

The backward state metric in the parallel window technique is now initialized at the end of the window rather than at the end of the symbol block. The forward state metric is initialized at the beginning of the window. In practice, the computations of the state metrics overlap into the subsequent/previous windows. The computation of the backward and forward state metrics in the parallel windows are performed in parallel. Accordingly, the size of the backward state metrics to be stored is significantly reduced.
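The window partition of the FIG. 10 example (2506 data bits, 128 bits per window, 30-bit overlap) can be reproduced with the short C sketch below. The way the overlap is applied here, with the backward recursion starting up to 30 bits past the window end and the forward recursion starting up to 30 bits before the window start, is an assumed reading of the description; the type window and the function names are hypothetical.

```c
#include <assert.h>

/* One window of the data symbol block: first bit index and length. */
typedef struct { int start; int len; } window;

/* Split n_bits into consecutive windows of win_size bits; the last
 * window holds the remainder. Returns the number of windows written. */
static int split_windows(int n_bits, int win_size, window *out, int max_windows)
{
    assert(n_bits > 0 && win_size > 0);
    int count = 0;
    for (int start = 0; start < n_bits; start += win_size) {
        assert(count < max_windows);
        int remaining = n_bits - start;
        out[count].start = start;
        out[count].len = (remaining < win_size) ? remaining : win_size;
        count++;
    }
    return count;
}

/* Assumed overlap handling: index at which the backward recursion of a
 * window is started (one past the last bit it touches), clamped to the
 * end of the block. */
static int backward_start(const window *w, int overlap, int n_bits)
{
    int idx = w->start + w->len + overlap;
    return (idx > n_bits) ? n_bits : idx;
}

/* Assumed overlap handling: index at which the forward recursion of a
 * window is started, clamped to the beginning of the block. */
static int forward_start(const window *w, int overlap)
{
    int idx = w->start - overlap;
    return (idx < 0) ? 0 : idx;
}
```

With the FIG. 10 parameters, split_windows yields twenty windows, nineteen of 128 bits and a final one of 74 bits starting at bit 2432.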

FIG. 11 shows an example of a memory storage reduction technique for further reducing the storage requirements for the backward state metrics of a high data-rate MAP decoder. The technique involves storing only a fraction of the state metrics into a memory. In the illustrated embodiment of FIG. 11, only every third metric is stored. The missing metrics are recalculated when they are needed during the computation of the extrinsic information. It has been found that this reduces the memory requirement by approximately 75%.

FIG. 12 shows an alternative example of the memory storage reduction technique for storing the backward state metrics of a modified unitary MAP decoder. In the illustrated embodiment of FIG. 12, only every other metric is stored. The missing metrics are recalculated when they are needed during the computation of the extrinsic information. It has been found that this reduces the memory requirement by approximately 50%.

Once the symbol block has been divided into parallel windows, the data bits for each window can be interleaved as illustrated in FIG. 13. In the interleaving process of FIG. 13, the data bits are arranged in a parallel window configuration and written in a row-wise format, but are read in a column-wise format. Therefore, the bit sequence 0, 1, 2, 3, . . . , 2505 is interleaved into another bit sequence for transmission: 0, 128, 256, . . . , 2304, 2432, 1, 129, 257, . . . , 2305, 2433, 2, 130, . . . , 2431.
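The row-wise write and column-wise read described above can be sketched in C as follows. The function name window_interleave and the convention of returning the read order as an array of source bit indices are assumptions made for the example; positions that fall past the end of the shorter last window are simply skipped.

```c
#include <assert.h>

/* Fill out[] with the order in which bit indices are read when the block
 * is written row by row (one window per row, win_size bits per row) and
 * read column by column. The last row may be shorter than the others.
 * Returns the number of indices written, which equals n_bits. */
static int window_interleave(int n_bits, int win_size, int *out)
{
    assert(n_bits > 0 && win_size > 0);
    int rows = (n_bits + win_size - 1) / win_size;   /* ceiling division */
    int k = 0;
    for (int col = 0; col < win_size; col++) {
        for (int row = 0; row < rows; row++) {
            int idx = row * win_size + col;
            if (idx < n_bits)            /* skip holes in the short last row */
                out[k++] = idx;
        }
    }
    return k;
}
```

For 2506 bits and 128 bits per window, the read order begins 0, 128, 256 and ends at 2431, matching the sequence given above.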

Since the last window has a different number of bits from other windows, the interleaving process is somewhat awkward and asymmetrical. To simplify the interleaving process, the last window can be separated from the rest of the windows. Thus, the interleaving process for the windows 0 through 18 becomes rectangular block interleaving. The last window can be interleaved independently and appended to the result of the rectangular block interleaving of the first 19 windows. The resultant interleaved bit sequence is 0, 128, 256, . . . , 2304, 1, 129, 257, . . . , 2305, 2, 130, . . . , 2431, 2432, 2451, . . . , 2505.

FIG. 14 is a block diagram of a high data-rate MAP decoder 1400 constructed in accordance with an embodiment of the invention. In the embodiment of FIG. 14, each trellis stage of the MAP decoder 1400 includes eight states (memory=3). Thus, eight decoders will be required to process the eight states. Two states of a trellis stage are implemented in a double-ACS configuration. The term “ACS” refers to the order of the computations (add, compare, and select) performed in the decoder. All of the computations are performed using the aforementioned modulo arithmetic. The modified MAP decoder 1400 of the illustrated embodiment shown in FIG. 14 is a type of soft-input soft-output (SISO) decoder/module.

The double-ACS based decoder 1400 is used for two trellis stages in each cycle, performing the first stage on the rising edge of a clock cycle and the second stage on the falling edge of the clock cycle. A multiplexer 1410 receives and selects appropriate branch metrics for a particular stage. The modules 1402 perform modulo-add and modulo-subtract operations to process branch metrics and forward or backward state metrics. The results of the modulo arithmetic are compared in the modulo-subtract modules 1404 to perform the MAX function. As described above, the MAX function approximates the log of exponentials. The multiplexers 1406 select backward or forward state metrics.

Use of double-ACS and use of the same hardware for both stages (through the rising edge and falling edge of the clock) of the turbo decoding process minimize the required hardware resources by minimizing the number of required multiplexers and simplifying the placement of the components. Further, since the computations are performed using modulo arithmetic, a normalization process can be eliminated from the double-ACS hardware.
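Why modulo arithmetic removes the need for normalization can be illustrated with a short sketch. It assumes the 11-bit range of the Appendix (METRIC_RANGE of 2048) and assumes that the two metrics being compared never differ by half the range or more; the helper names wrap and same_winner are hypothetical and exist only for this illustration.

```c
#include <assert.h>

#define METRIC_RANGE 2048   /* 11-bit registers, as in the Appendix */

/* Modulo subtraction for operands already in [0, METRIC_RANGE). */
static int modulo_sub(int x, int y) { return (x - y + METRIC_RANGE) % METRIC_RANGE; }

/* Modulo MAX: x wins when it is "ahead" of y by less than half the range. */
static int max_int(int x, int y) { return modulo_sub(x, y) < METRIC_RANGE / 2 ? x : y; }

/* Wrap an unbounded metric into the register range (hypothetical helper). */
static int wrap(long v)
{
    return (int)(((v % METRIC_RANGE) + METRIC_RANGE) % METRIC_RANGE);
}

/* Returns 1 when the modulo comparison of two wrapped metrics picks the same
   winner as an unbounded comparison, after both metrics have grown by a
   common offset. Expected to hold whenever -1024 <= a - b < 1024. */
static int same_winner(long a, long b, long offset)
{
    long true_max = (a > b) ? a : b;
    return max_int(wrap(a + offset), wrap(b + offset)) == wrap(true_max + offset);
}
```

Because only the difference of the two metrics enters the comparison, a common growth in all state metrics wraps around harmlessly. For example, max_int(5, 2040) selects 5, since 5 is 13 steps ahead of 2040 once the register has wrapped.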

As described above, a decoder uses mathematical and approximation techniques to reduce the time for computation and memory otherwise required in turbo decoding. In particular, the decoder uses modulo arithmetic in one or more modified maximum a posteriori (MAP) modules configured with double add-compare-select (ACS) circuits. The modified MAP module uses a parallel window technique to perform backward state metric calculation recursion and to store the metrics. Forward state metric calculation and extrinsic information calculation are performed in parallel using the stored backward state metrics. These techniques increase the speed of computation and reduce the memory required for computation.

The present invention has been described above in terms of exemplary embodiments so that an understanding of the present invention can be conveyed. Any embodiment described herein as “exemplary” is not necessarily to be construed as preferred or advantageous over other embodiments. Moreover, there are many configurations for a communication system using modified turbo decoding and/or multicarrier processing not specifically described herein but to which the present invention is applicable. The present invention should therefore not be seen as limited to the particular embodiments described herein, but rather, it should be understood that the present invention has wide applicability with respect to wireless communication systems generally. However, it should be understood that some aspects of the invention can be practiced in wired communication systems. All modifications, variations, or equivalent arrangements and implementations that are within the scope of the attached claims should therefore be considered within the scope of the invention.

Appendix

Functions for Modulo-Arithmetic Computation

#define METRIC_RANGE 2048 // 11 bit registers (positive integers)

int modulo_sub(int x, int y) {
  int output;
  output = x - y;
  if (output >= METRIC_RANGE)
    output = output - METRIC_RANGE;
  else if (output < 0)
    output = output + METRIC_RANGE;
  return(output);
}

int modulo_add(int x, int y) {
  int output;
  output = x + y;
  if (output >= METRIC_RANGE)
    output = output - METRIC_RANGE;
  else if (output < 0)
    output = output + METRIC_RANGE;
  return(output);
}

int max_int(int x, int y) {
  int output;
  if (modulo_sub(x, y) < METRIC_RANGE/2)
    output = x;
  else
    output = y;
  return(output);
}

Calculation of Backward State Metric Recursion (Beta) Using Modulo-Arithmetic

#define R_LENGTH 128
#define P_LENGTH 30

// Compute beta: initialization
if ((i == num_windows-1) ||
    ((i == num_windows-2) &&
     (N_data - (num_windows-1)*R_LENGTH < P_LENGTH))) {
  beta[(beta_window_sz-1)*8 + 0] = 508;
  beta[(beta_window_sz-1)*8 + 1] = 254;
  beta[(beta_window_sz-1)*8 + 2] = 127;
  beta[(beta_window_sz-1)*8 + 3] = 127;
  beta[(beta_window_sz-1)*8 + 4] = 0;
  beta[(beta_window_sz-1)*8 + 5] = 0;
  beta[(beta_window_sz-1)*8 + 6] = 127;
  beta[(beta_window_sz-1)*8 + 7] = 127;
} else {
  for (k=0; k<8; k++)
    beta[(beta_window_sz-1)*8 + k] = 0;
}

// Compute beta: computation
counter = i*R_LENGTH + num_bits + tp_length - 1;
for (k=beta_window_sz-2; k>=0; k--) {
  metric_t = modulo_sub(beta[(k+1)*8 + 0], metric11[counter]);
  metric_b = modulo_add(beta[(k+1)*8 + 4], metric11[counter]);
  beta[k*8 + 0] = max_int(metric_t, metric_b);
  metric_t = modulo_add(beta[(k+1)*8 + 0], metric11[counter]);
  metric_b = modulo_sub(beta[(k+1)*8 + 4], metric11[counter]);
  beta[k*8 + 1] = max_int(metric_t, metric_b);
  metric_t = modulo_add(beta[(k+1)*8 + 1], metric10[counter]);
  metric_b = modulo_sub(beta[(k+1)*8 + 5], metric10[counter]);
  beta[k*8 + 2] = max_int(metric_t, metric_b);
  metric_t = modulo_sub(beta[(k+1)*8 + 1], metric10[counter]);
  metric_b = modulo_add(beta[(k+1)*8 + 5], metric10[counter]);
  beta[k*8 + 3] = max_int(metric_t, metric_b);
  metric_t = modulo_sub(beta[(k+1)*8 + 2], metric10[counter]);
  metric_b = modulo_add(beta[(k+1)*8 + 6], metric10[counter]);
  beta[k*8 + 4] = max_int(metric_t, metric_b);
  metric_t = modulo_add(beta[(k+1)*8 + 2], metric10[counter]);
  metric_b = modulo_sub(beta[(k+1)*8 + 6], metric10[counter]);
  beta[k*8 + 5] = max_int(metric_t, metric_b);
  metric_t = modulo_add(beta[(k+1)*8 + 3], metric11[counter]);
  metric_b = modulo_sub(beta[(k+1)*8 + 7], metric11[counter]);
  beta[k*8 + 6] = max_int(metric_t, metric_b);
  metric_t = modulo_sub(beta[(k+1)*8 + 3], metric11[counter]);
  metric_b = modulo_add(beta[(k+1)*8 + 7], metric11[counter]);
  beta[k*8 + 7] = max_int(metric_t, metric_b);
  counter--;
}

Calculation of Forward State Metric Recursion (Alpha) and Extrinsic Information Using Modulo-Arithmetic

// Compute alpha: initialization
alpha = alpha_array1;
alpha_new = alpha_array2;
if (i==0) {
  alpha[0] = 508;
  alpha[1] = 0;
  alpha[2] = 127;
  alpha[3] = 127;
  alpha[4] = 254;
  alpha[5] = 0;
  alpha[6] = 127;
  alpha[7] = 127;
} else {
  for (k=0; k<8; k++)
    alpha[k] = 0;
}

// Compute alpha: computation
counter = i*R_LENGTH - hp_length;
factor = 0;
for (k=0; k<hp_length+num_bits; k++) {
  metric_t = modulo_sub(alpha[0], metric11[counter]);
  metric_b = modulo_add(alpha[1], metric11[counter]);
  alpha_new[0] = max_int(metric_t, metric_b);
  metric_t = modulo_add(alpha[0], metric11[counter]);
  metric_b = modulo_sub(alpha[1], metric11[counter]);
  alpha_new[4] = max_int(metric_t, metric_b);
  metric_t = modulo_add(alpha[2], metric10[counter]);
  metric_b = modulo_sub(alpha[3], metric10[counter]);
  alpha_new[1] = max_int(metric_t, metric_b);
  metric_t = modulo_sub(alpha[2], metric10[counter]);
  metric_b = modulo_add(alpha[3], metric10[counter]);
  alpha_new[5] = max_int(metric_t, metric_b);
  metric_t = modulo_sub(alpha[4], metric10[counter]);
  metric_b = modulo_add(alpha[5], metric10[counter]);
  alpha_new[2] = max_int(metric_t, metric_b);
  metric_t = modulo_add(alpha[4], metric10[counter]);
  metric_b = modulo_sub(alpha[5], metric10[counter]);
  alpha_new[6] = max_int(metric_t, metric_b);
  metric_t = modulo_add(alpha[6], metric11[counter]);
  metric_b = modulo_sub(alpha[7], metric11[counter]);
  alpha_new[3] = max_int(metric_t, metric_b);
  metric_t = modulo_sub(alpha[6], metric11[counter]);
  metric_b = modulo_add(alpha[7], metric11[counter]);
  alpha_new[7] = max_int(metric_t, metric_b);

  // Compute extrinsic information
  if (k >= hp_length) {
    temp_deno = modulo_sub(modulo_add(alpha[0],
        beta[factor*8 + 0]), metric11[counter]);
    denominator = temp_deno;
    temp_num = modulo_add(modulo_add(alpha[0],
        beta[factor*8 + 4]), metric11[counter]);
    numerator = temp_num;
    temp_deno = modulo_sub(modulo_add(alpha[1],
        beta[factor*8 + 4]), metric11[counter]);
    denominator = max_int(temp_deno, denominator);
    temp_num = modulo_add(modulo_add(alpha[1],
        beta[factor*8 + 0]), metric11[counter]);
    numerator = max_int(temp_num, numerator);
    temp_deno = modulo_sub(modulo_add(alpha[2],
        beta[factor*8 + 5]), metric10[counter]);
    denominator = max_int(temp_deno, denominator);
    temp_num = modulo_add(modulo_add(alpha[2],
        beta[factor*8 + 1]), metric10[counter]);
    numerator = max_int(temp_num, numerator);
    temp_deno = modulo_sub(modulo_add(alpha[3],
        beta[factor*8 + 1]), metric10[counter]);
    denominator = max_int(temp_deno, denominator);
    temp_num = modulo_add(modulo_add(alpha[3],
        beta[factor*8 + 5]), metric10[counter]);
    numerator = max_int(temp_num, numerator);
    temp_deno = modulo_sub(modulo_add(alpha[4],
        beta[factor*8 + 2]), metric10[counter]);
    denominator = max_int(temp_deno, denominator);
    temp_num = modulo_add(modulo_add(alpha[4],
        beta[factor*8 + 6]), metric10[counter]);
    numerator = max_int(temp_num, numerator);
    temp_deno = modulo_sub(modulo_add(alpha[5],
        beta[factor*8 + 6]), metric10[counter]);
    denominator = max_int(temp_deno, denominator);
    temp_num = modulo_add(modulo_add(alpha[5],
        beta[factor*8 + 2]), metric10[counter]);
    numerator = max_int(temp_num, numerator);
    temp_deno = modulo_sub(modulo_add(alpha[6],
        beta[factor*8 + 7]), metric11[counter]);
    denominator = max_int(temp_deno, denominator);
    temp_num = modulo_add(modulo_add(alpha[6],
        beta[factor*8 + 3]), metric11[counter]);
    numerator = max_int(temp_num, numerator);
    temp_deno = modulo_sub(modulo_add(alpha[7],
        beta[factor*8 + 3]), metric11[counter]);
    denominator = max_int(temp_deno, denominator);
    temp_num = modulo_add(modulo_add(alpha[7],
        beta[factor*8 + 7]), metric11[counter]);
    numerator = max_int(temp_num, numerator);
    sequence1[counter] = modulo_sub(modulo_sub(numerator, denominator),
        out_deinterleave2[counter]);
    if (sequence1[counter] >= METRIC_RANGE/2)
      sequence1[counter] = sequence1[counter] - METRIC_RANGE;
    if (sequence1[counter] > 127)
      sequence1[counter] = 127;
    else if (sequence1[counter] < -127)
      sequence1[counter] = -127;
    factor++;
  }
  counter++;
  temp = alpha;
  alpha = alpha_new;
  alpha_new = temp;
}

1. A decoder for a communication system, the decoder comprising: a first decoder block that receives a soft-input information bit for decoding and calculates a probability estimate for the soft-input information bit; a second decoder block configured to receive and process the probability estimate of the soft-input information bit using modulo arithmetic operations; and a decision module adapted to receive the processed soft-input information bit and to generate hard-decision output information.
2. A decoder as defined in claim 1, wherein the first decoder block includes an output element configured to receive the soft-input information bit and to generate extrinsic information.
3. A decoder as defined in claim 2, further comprising: an interleaver configured to interleave the received output extrinsic information, and to direct the interleaved output to the second decoder block.
4. A decoder as defined in claim 3, wherein the second decoder block includes a state metric calculator configured to calculate backward and forward metrics using the soft-input information bit and the extrinsic information.
5. A decoder as defined in claim 4, further comprising: a deinterleaver configured to deinterleave output of the second decoder block, and to feed the deinterleaved output back to the first decoder block.
6. A decoder for a communication system, the decoder comprising: an iterative decoding module configured to receive soft-input information bits, the iterative decoding module iterating on probability estimates of the soft-input information bits to generate soft-decision output information, wherein the iterative decoding module generates and processes both backward and forward metrics substantially simultaneously using modulo arithmetic operations; and an output module configured to receive the soft-decision output information and to generate hard-decision output information.
7. A decoder as defined in claim 6, wherein the iterative decoding module includes two sub-modules to simultaneously process first and second stages of trellis decoding.
8. A decoder as defined in claim 7, wherein the iterative decoding module includes an output element configured to receive the soft-input information bits and output extrinsic information on the first and second stage of trellis decoding.
9. A decoder as defined in claim 8, further comprising: an interleaver configured to interleave the received output extrinsic information, and to direct the interleaved output to the iterative decoding module for the second stage of trellis decoding.
10. A decoder as defined in claim 8, further comprising: a deinterleaver configured to deinterleave the received output extrinsic information, and to direct the deinterleaved output to the iterative decoding module for the first stage of trellis decoding.
11. A module as defined in claim 6, wherein the iterative decoding module is a maximum a priori (MAP) decoder.
12. A receiver for a communication system, comprising: an RF unit configured to receive RF signals and to down-convert the signal to a baseband signal; a demodulator configured to demodulate, process, and digitize the baseband signal into soft-input information bits; and an iterative decoding module configured to receive the soft-input information bits, the iterative decoding module iterating on probability estimates of the soft-input information bits to generate hard-decision output information, wherein the iterative decoding module includes a plurality of arithmetic modules operating to generate and process both backward and forward metrics substantially simultaneously using modulo arithmetic operations.
13. A receiver as defined in claim 12, wherein the demodulator includes a demapper for converting the digitized data into the soft-input information bits.
14. A receiver as defined in claim 12, wherein the demodulator is a multicarrier demodulator.
15. A receiver as defined in claim 14, wherein the multicarrier demodulator includes a fast Fourier transform module to calculate Fourier transforms of the baseband signals.
16. A receiver as defined in claim 12, wherein the plurality of arithmetic modules includes two sub-modules to simultaneously process first and second stages of trellis decoding.
17. A soft-input soft-output (SISO) decoding block for a communication system, the block comprising: a first plurality of modules configured to receive soft-input backward state metrics, the first plurality of modules operating to process the backward state metrics using modulo arithmetic; a first multiplexer to select the backward state metrics; a second plurality of modules configured to receive soft-input forward state metrics, the second plurality of modules operating to process the forward state metrics using modulo arithmetic; and a second multiplexer to select the forward state metrics.
18. A block as defined in claim 17, wherein the SISO block is a maximum a priori (MAP) decoder.
19. A block as defined in claim 17, wherein the first plurality of modules includes a first modulo arithmetic comparator to provide a selection signal to the first multiplexer.
20. A block as defined in claim 17, wherein the second plurality of modules includes a second modulo arithmetic comparator to provide a selection signal to the second multiplexer.
21. A block as defined in claim 17, further comprising: a third multiplexer configured to provide an appropriate branch metrics for the first and second plurality of modules.
22. A block as defined in claim 21, further comprising: an input clock configured to synchronize and control processing of the backward and forward state metrics.
23. A block as defined in claim 21, wherein the third multiplexer includes a first signal element to enable the first plurality of modules at rising edges of the input clock.
24. A block as defined in claim 22, wherein the third multiplexer includes a second signal element to enable the second plurality of modules at falling edges of the input clock.
25. A method of decoding soft-input symbol block data in a communication system, comprising: performing backward recursion of a trellis diagram by computing backward state metrics of each node on the trellis diagram of the symbol block data; storing the backward state metrics to a storage mechanism; performing forward recursion of the trellis diagram by computing forward state metrics of each node on the trellis diagram of the symbol block data; and calculating extrinsic information.
26. A method as defined in claim 25, further comprising: computing first and second branch metrics along branches from two previous nodes of the trellis diagram.
27. A method as defined in claim 26, wherein performing backward recursion on a trellis diagram includes computing a sum comprising a previous backward state metric multiplied by a first branch metric and a previous backward state metric multiplied by a second branch metric.
28. A method as defined in claim 26, wherein calculating extrinsic information includes computing a log likelihood ratio for each time point of the trellis diagram.
29. A method as defined in claim 26, wherein computing a log likelihood ratio includes: detecting when a decoded bit is determined to be a ‘1’; and computing a first sum of a plurality of products of the backward state metrics, forward state metrics, and the first and second branch metrics at a particular time that is associated with a decoded bit representation ‘1’.
30. A method as defined in claim 29, wherein computing a log likelihood ratio includes: detecting when a decoded bit is determined to be a ‘0’; and computing a second sum of a plurality of products of the backward state metrics, forward state metrics, and the first and second branch metrics at a particular time that is associated with a decoded bit representation ‘0’.
31. A method as defined in claim 30, wherein computing a log likelihood ratio includes dividing the first sum by the second sum.
32. A method as defined in claim 31, wherein calculating extrinsic information includes subtracting a probability estimate of the soft-input symbol block data from the log likelihood ratio.
33. A method as defined in claim 32, further comprising: iterating on performing backward recursion, storing, performing forward recursion, and calculating extrinsic information.
34. A method as defined in claim 33, further comprising: retrieving decoded information bits from the iterating by examining a sign bit of the log likelihood ratio.
35. A method as defined in claim 34, further comprising: outputting a hard-decision bit as a ‘1’ if the sign bit is positive; and outputting a hard-decision bit as a ‘0’ if the sign bit is negative.
36. A method as defined in claim 25, further comprising: dividing the symbol block data into a plurality of windows, each window including soft-input bits.
37. A method as defined in claim 36, wherein dividing the symbol block data includes allowing overlaps between the plurality of windows.
38. A method as defined in claim 36, further comprising: processing the soft-input bits of the plurality of windows in parallel.
39. A method as defined in claim 38, further comprising: interleaving the soft-input bits in the plurality of windows.